1. INTRODUCTION

Personal identification is essential to the safety and security of many systems, such as building access or
financial deposit systems. Advances in security systems are often followed by advances in the techniques hackers
use to defeat them, which motivates the continued development of security technology. One identification
approach that is still under active development is biometric-based identification [1]. Biometric traits that can
be used for identification include fingerprint, face, iris, speech, and finger vein [2], [3].

The finger vein-based biometric method has several advantages for security systems. It is suitable for
authentication systems that require high accuracy and security. The finger vein pattern differs between a living
and a deceased person, which makes it more difficult to fake. Finger vein images are acquired with devices that
use a near-infrared camera system. Image acquisition relies on a contrast difference: the deoxygenated blood in
the veins absorbs more near-infrared radiation than the surrounding tissue, so the veins appear darker than other
areas. This difference can be processed further and finally classified using artificial intelligence.

Several artificial intelligence methods have been applied to real-world problems [4]-[8]. One such application
is the identification and verification of finger vein images [9]-[13]. Deep learning offers several algorithms
for this task, such as convolutional neural networks (CNN) [14]-[16]. In feature recognition, CNN has advantages
in performance and accuracy. However, CNN generally has a disadvantage because it does not take the spatial
relationships between features into account during feature extraction [17]. When several images have the same
values in a particular area, pattern recognition becomes more difficult: a CNN may treat the images as identical
even though they are different [18], [19]. To overcome this problem, this paper proposes a finger vein
identification system using capsule networks with hyperparameter tuning. We optimized the capsule architecture
and the number of routing iterations at the capsule layer to find the best capsule network for identifying
finger vein images.

The rest of this paper is organized as follows. Section 2 describes the research method in detail. Section 3
presents the experimental results, including the training and testing of the finger vein identification system
using capsule networks with hyperparameter tuning. Finally, Section 4 concludes the paper.


2. RESEARCH METHOD 

Figure 1 shows the proposed finger vein image identification system using capsule networks with
hyperparameter tuning. The system consists of two stages: data preprocessing, followed by a capsule network
classifier whose hyperparameters are tuned.


Finger vein images -> Data preprocessing -> Capsule networks -> Identity of finger vein owner


Figure 1. Finger vein image identification system using capsule networks with hyperparameter tuning


2.1. Data collection and preprocessing 

We used finger vein digital images from the Shandong University Machine Learning and Applications-Homologous
Multimodal Traits (SDUMLA-HMT) dataset [20], [21]. The data were divided into three parts: a training set, a
validation set, and a test set. Each class has six images. We split the images of each class into three
sections, i.e., four images for training, one for validation, and two for evaluation [19], [20]. The separate
validation set was used to guard against overfitting [22]. The training set was further enlarged by augmenting
the images with rotation and translation transformations.

For each subject, the images of one finger are used, giving 636 images in 106 classes. The finger vein images
were preprocessed in two stages: region of interest (ROI) extraction and image enhancement using the contrast
limited adaptive histogram equalization (CLAHE) method [23]-[25]. Figure 2 shows the ROI extraction process.
ROI extraction determines the exact area of the image that contains the finger veins; this area is located by
detecting the contour of the finger edges. Cropping the ROI produces an image of 180x100 pixels. ROI extraction
is performed to reduce within-class variations caused by image capture errors.


Raw image -> Area of vein -> ROI


Figure 2. ROI extraction process [21] 
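The ROI extraction described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a grayscale image with the finger lying horizontally and appearing brighter than the background, locates the upper and lower finger edges from the strongest vertical intensity changes in each column, crops between the innermost edges, and resizes the crop to 100 rows by 180 columns with nearest-neighbour sampling. Reading 180x100 as width x height is also an assumption.

```python
import numpy as np


def extract_roi(img, out_h=100, out_w=180):
    """Crop the finger between its upper and lower edges and resize it.

    Assumes a 2-D grayscale image with a horizontal finger that is brighter
    than the background (a simplification of NIR finger vein captures).
    """
    img = np.asarray(img, dtype=np.float64)
    mid = img.shape[0] // 2
    # Vertical gradient: positive where intensity increases going down a column.
    grad = np.diff(img, axis=0)
    # Upper edge per column: strongest dark-to-bright step in the top half.
    upper = np.argmax(grad[:mid], axis=0) + 1
    # Lower edge per column: strongest bright-to-dark step in the bottom half.
    lower = np.argmin(grad[mid:], axis=0) + mid
    # Innermost edges keep the crop inside the finger in every column.
    top, bottom = int(upper.max()), int(lower.min()) + 1
    roi = img[top:bottom, :]
    # Nearest-neighbour resize to out_h x out_w.
    rows = np.linspace(0, roi.shape[0] - 1, out_h).round().astype(int)
    cols = np.linspace(0, roi.shape[1] - 1, out_w).round().astype(int)
    return roi[np.ix_(rows, cols)]
```

Real captures have curved or tilted finger edges and noise, so a practical version would smooth the gradient and fit the edge contours rather than taking a per-column maximum.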


CLAHE is implemented with the open-source computer vision (OpenCV) library, a library of image processing tools
originally developed by Intel [26], [27]. CLAHE improves on adaptive histogram equalization by clipping the
histogram at a clip limit before equalization, which reduces the over-amplification of contrast, and therefore
of noise. Figure 3 shows an example of the input and output of finger vein image contrast enhancement using
CLAHE.
Figure 3. Example of the input and output of finger vein image contrast enhancement using CLAHE [21]
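To show what the clip limit does, here is a minimal NumPy sketch of contrast-limited equalization for a single tile. Full CLAHE, as provided by OpenCV's `cv2.createCLAHE`, also splits the image into tiles and interpolates between their mappings; both steps are omitted here. The clip limit is expressed as a fraction of the tile's pixel count, a choice made for this sketch that is not the same as OpenCV's `clipLimit` parameter, and the clipped excess is redistributed in a single pass.

```python
import numpy as np


def clip_limited_equalize(tile, clip_limit=0.01, n_bins=256):
    """Histogram equalization of one uint8 tile with a clipped histogram."""
    hist = np.bincount(tile.ravel(), minlength=n_bins).astype(np.float64)
    # Assumption: clip limit given as a fraction of the tile's pixel count.
    limit = max(clip_limit * tile.size, 1.0)
    excess = np.clip(hist - limit, 0.0, None).sum()
    hist = np.minimum(hist, limit)
    # Spread the clipped counts evenly over all bins (single pass).
    hist += excess / n_bins
    cdf = np.cumsum(hist)
    # Map each grey level through the normalized cumulative histogram.
    lut = np.round(cdf / cdf[-1] * (n_bins - 1)).astype(np.uint8)
    return lut[tile]
```

For a low-contrast tile whose values span 100 to 119, a clip limit of 0.01 maps them to the range 83 to 147, whereas an effectively unclipped limit of 1.0 maps them to 13 to 255; the clip limit restrains how strongly the contrast is stretched.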


2.2. Design of capsule networks with hyperparameter tuning 

This research implements capsule networks for the finger vein identification system [28], [29]. We used several
Python libraries, including TensorFlow, NumPy, Pandas, Keras, and scikit-learn [22], [30]-[33]. The
hyperparameters varied in the model concern the augmentation, the capsule layer, and the convolution layer. At
the capsule layer, we varied the capsule architecture and the number of routing iterations. The convolution
layer is used for image feature extraction; we varied the number of convolution layers and other parameters
such as stride, kernel size, and input image size.

The model was trained using Google Colaboratory [34]. Each variation is trained separately to find the best
setting for each layer, and the best settings are then combined into a new model. Each training run uses the
Adam optimizer with a learning rate of 0.001, with the margin loss as the loss function.
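The margin loss is not written out in this paper. The sketch below follows the formulation of Sabour et al. [35], L_k = T_k max(0, m+ - ||v_k||)^2 + lambda (1 - T_k) max(0, ||v_k|| - m-)^2, summed over classes and averaged over the batch. The values m+ = 0.9, m- = 0.1, and lambda = 0.5 are their defaults and are assumptions here. It is written in NumPy for clarity; inside a Keras model the same formula would be expressed with TensorFlow operations.

```python
import numpy as np


def margin_loss(y_true, v_norm, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Capsule margin loss, summed over classes and averaged over the batch.

    y_true: one-hot labels, shape (batch, n_classes)
    v_norm: lengths of the digit-capsule output vectors, same shape
    """
    # Penalize the true class when its capsule is shorter than m_plus.
    present = y_true * np.maximum(0.0, m_plus - v_norm) ** 2
    # Penalize other classes when their capsules are longer than m_minus.
    absent = lam * (1.0 - y_true) * np.maximum(0.0, v_norm - m_minus) ** 2
    return float(np.mean(np.sum(present + absent, axis=1)))
```

The down-weighting factor lambda keeps the many absent classes (106 here) from dominating the loss early in training.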


3. RESULTS AND DISCUSSION 

This section presents the results and performance of the finger vein identification system. We report the
performance of the baseline model, the effect of image preprocessing with ROI extraction and CLAHE, and the
effect of augmentation. We also report how the number of routing iterations, the capsule layer architecture,
and the convolution layer variations affect performance.


3.1. Baseline model 

The baseline model follows the capsule networks proposed by Sabour et al. [35] and Hinton et al. [36]. The
accuracy obtained in training shows that the baseline model cannot recognize all images correctly. Its accuracy
can be improved by adding batch normalization to the convolution layer of the model [37]. Batch normalization
normalizes the activations using the mean and variance computed over each mini-batch. Figure 4 shows the
performance of the baseline model after batch normalization is added: batch normalization increases the
accuracy of the baseline model.
Figure 4. The accuracy of the baseline model with batch normalization [21]
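A minimal NumPy sketch of the batch normalization step in training mode. Each channel is normalized with the mean and variance of the current mini-batch and then scaled and shifted by the learned parameters gamma and beta. The channels-last (batch, height, width, channels) layout and eps = 1e-3 are assumptions for illustration, and the moving averages used at inference time are omitted.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization of conv activations.

    x: activations, shape (batch, height, width, channels)
    gamma, beta: learned per-channel scale and shift, shape (channels,)
    """
    # Statistics per channel over the batch and both spatial axes.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```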
3.2. Image processing with ROI extraction and CLAHE

In this research, image preprocessing with ROI extraction and CLAHE increases the model accuracy. The
preprocessed images have more distinct and sharper features, which makes it easier for the model to recognize
them. Testing is conducted on the two test images of each class, and the system takes the class with the highest
output value as its prediction.
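If the output value is taken to be the length of each digit capsule, as in Sabour et al. [35] (an assumption, since this paper does not state it), prediction and test accuracy reduce to the short sketch below.

```python
import numpy as np


def predict_identity(digit_caps):
    """Predicted class per sample: the digit capsule with the longest vector.

    digit_caps: output vectors of the digit-capsule layer,
    shape (batch, n_classes, dim)
    """
    lengths = np.linalg.norm(digit_caps, axis=-1)  # (batch, n_classes)
    return np.argmax(lengths, axis=-1)


def accuracy(digit_caps, labels):
    """Fraction of samples whose predicted class equals the label."""
    return float(np.mean(predict_identity(digit_caps) == labels))
```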


3.3. Augmentation effect 

The training set is augmented to reduce overfitting, because it is relatively small. Each batch of images is
augmented anew in every epoch, so each training image appears with a new variation in each epoch. We augmented
the finger vein images with rotation and translation transformations. Rotation is performed with an angle of
5 degrees, and translation shifts the image horizontally and vertically by five percent of its width and height.
Figure 5 shows the effect of augmentation on the training loss of the model. Although the loss decreased
somewhat more slowly, the capsule network was still trained successfully with data augmentation in the training
phase.
Figure 5. Training loss of capsule network model of finger vein identification with augmentation variation
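The per-epoch augmentation described above can be sketched in NumPy as a random rotation plus translation applied each time an image is drawn. This is not the authors' code: treating 5 degrees and five percent as the maxima of a uniform random draw, and using nearest-neighbour sampling with zero fill, are assumptions. In Keras, preprocessing layers such as `RandomRotation` and `RandomTranslation` offer a comparable built-in route.

```python
import numpy as np


def random_rotate_translate(img, rng, max_deg=5.0, max_shift=0.05):
    """Randomly rotate and shift a 2-D image (nearest-neighbour, zero fill).

    Assumption: the angle is drawn uniformly from [-max_deg, max_deg] degrees
    and each shift from [-max_shift, max_shift] times the image size.
    """
    h, w = img.shape
    theta = np.deg2rad(rng.uniform(-max_deg, max_deg))
    ty = rng.uniform(-max_shift, max_shift) * h
    tx = rng.uniform(-max_shift, max_shift) * w
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse mapping: undo the shift, then rotate back about the image centre
    # to find the source pixel of every output pixel.
    y0, x0 = yy - cy - ty, xx - cx - tx
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    src_y = np.rint(cos_t * y0 - sin_t * x0 + cy).astype(int)
    src_x = np.rint(sin_t * y0 + cos_t * x0 + cx).astype(int)
    valid = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out = np.zeros_like(img)  # pixels mapped from outside the image stay 0
    out[valid] = img[src_y[valid], src_x[valid]]
    return out
```

Calling it on every image of every batch, for example `batch = [random_rotate_translate(x, rng) for x in batch]` inside the epoch loop, gives each training image a fresh variation in each epoch.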


3.4. Routing iterations and capsule layer architecture 

This paper optimizes the number of routing iterations and the capsule layer architecture. Routing in capsule
networks is described in several references [38]-[40]. In this paper, three values for the number of routing
iterations at the capsule layer are tested: three, four, and five. Figure 6 shows the effect of the number of
routing iterations on the validation loss. The system achieved the highest accuracy with four routing
iterations. The second variation of the capsule layer is its architecture. The baseline capsule architecture
uses 8-dimensional capsules in the primary capsule layer and 16-dimensional capsules in the digitcaps layer.
Table 1 shows the variations of the capsule architecture.
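For readers unfamiliar with routing, a compact NumPy sketch of routing by agreement in the form described by Sabour et al. [35] follows; `n_iters` is the hyperparameter varied above. It operates on the prediction vectors `u_hat` of a single sample and is illustrative rather than the implementation used in this study.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Shrink short vectors toward zero and long vectors to just below unit length."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def dynamic_routing(u_hat, n_iters=3):
    """Routing by agreement for one sample.

    u_hat: prediction vectors from each input capsule for each output capsule,
           shape (n_in, n_out, dim_out)
    returns: output capsule vectors, shape (n_out, dim_out)
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits, start uniform
    for it in range(n_iters):
        # Coupling coefficients: softmax of the logits over output capsules.
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        # Weighted sum of predictions for each output capsule, then squash.
        s = np.einsum("ij,ijd->jd", c, u_hat)
        v = squash(s)
        if it < n_iters - 1:
            # Increase the logits where predictions agree with the output.
            b = b + np.einsum("ijd,jd->ij", u_hat, v)
    return v
```

Each extra iteration repeats the softmax, weighted sum, and agreement update, so the routing cost grows linearly with `n_iters`.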

For the digitcaps layer, the computational complexity is influenced by the number of classes, the vector
dimensions, and the number of input channels from the primary capsule layer. Figure 7 shows the effect of the
capsule architecture variation on the validation accuracy of the model. Architectures 1 and 2 yield the same
accuracy. However, the accuracy of the baseline architecture converges faster, and the accuracy of architecture 2
has more difficulty reaching stability. Optimal accuracy is obtained from the model that uses the baseline
capsule dimensions, namely 8 dimensions in the primary capsule layer and 16 dimensions in the digitcaps layer,
with three routing iterations.
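The dependence of the digitcaps cost on these quantities can be made concrete by counting the entries of its transformation matrices, assuming the CapsNet layout of Sabour et al. [35], in which every primary capsule has one (primary dimension x digitcaps dimension) matrix per class. The number of primary capsules in this study is not reported, so the example uses the 6x6x32 = 1,152 primary capsules of the original MNIST CapsNet purely for illustration.

```python
def primary_caps_count(height, width, caps_channels):
    """Number of primary capsules: one per spatial position and capsule channel."""
    return height * width * caps_channels


def digitcaps_weight_count(n_primary, n_classes, primary_dim=8, digit_dim=16):
    """Entries of the W_ij matrices linking every primary capsule to every class capsule."""
    return n_primary * n_classes * primary_dim * digit_dim
```

With 1,152 primary capsules and 10 classes this gives 1,474,560 weights; with the 106 classes of this dataset it rises to 15,630,336, because the count grows linearly with the number of classes and with each capsule dimension.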


3.5. Convolution layer variation 

The computational complexity of the convolution layer depends on several factors, including the kernel size,
the number of features, the stride, and the image size. The convolution layer variations focus on the influence
of the kernel size, the stride, and the number of layers. Table 2 shows the convolution models of the capsule
network finger vein identification system. The kernel and stride sizes affect feature extraction: the smaller
the stride, the better the feature extraction, but a smaller stride also produces larger feature maps and more
computation. The strides in the variations were therefore chosen so that the computational complexity remained
within the capacity of the available hardware. The kernel sizes are 3x3, 5x5, and 9x9; the larger the kernel,
the more parameters are needed. Figure 8 shows the effect of the convolution layer variation on the validation
accuracy. Convolution model 3 achieves an accuracy of 91.25%, the highest in the experiments.
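The effect of kernel size and stride on the feature-map size can be checked with the standard output-size formula for an unpadded ("valid") convolution, floor((n - k) / s) + 1. The sketch below encodes Table 2 as (kernel, stride) pairs and assumes "valid" padding and a 100x180 (height x width) input; neither assumption is stated in the paper. The 256 filters in the weight-count example are likewise hypothetical, and biases are not counted.

```python
# Table 2 encoded as (kernel_size, stride) per layer.
CONV_MODELS = {
    "baseline": [(9, 1)],
    "model_1": [(3, 2), (3, 2), (3, 2)],
    "model_2": [(9, 2), (5, 2), (3, 1)],
    "model_3": [(3, 1), (3, 1), (3, 2), (3, 2), (3, 2)],
}


def conv_output_size(n, kernel, stride):
    """Output length along one axis of an unpadded ('valid') convolution."""
    return (n - kernel) // stride + 1


def feature_map_shape(height, width, layers):
    """Spatial size after applying each (kernel, stride) layer in turn."""
    for kernel, stride in layers:
        height = conv_output_size(height, kernel, stride)
        width = conv_output_size(width, kernel, stride)
    return height, width


def conv_weight_count(kernel, c_in, c_out):
    """Weights of one square-kernel convolution layer (bias excluded)."""
    return kernel * kernel * c_in * c_out
```

Under these assumptions the baseline keeps a 92x172 feature map, convolution models 1 and 3 both reduce the input to 11x21, and model 2 reduces it to 19x39. On a one-channel input, a 9x9 layer with 256 filters has 20,736 weights against 2,304 for a 3x3 layer.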
Figure 6. The effect of the number of routing iterations on the validation loss [21]


Table 1. Architecture type of capsule layer of the finger vein identification system 


No   Architecture type       Primary capsule dimension   Digitcaps dimension
1    Baseline architecture   8                           16
Figure 7. The effect of capsule architecture on the validation accuracy of the model [21]
Table 2. Convolution model of the finger vein identification system using capsule networks


Convolution layer model   Convolution layer   Specification
Baseline model            First layer         9x9 kernel, stride 1, ReLU
Convolution model 1       First layer         3x3 kernel, stride 2, ReLU, batch normalization
                          Second layer        3x3 kernel, stride 2, ReLU, batch normalization
                          Third layer         3x3 kernel, stride 2, ReLU, batch normalization
Convolution model 2       First layer         9x9 kernel, stride 2, ReLU, batch normalization
                          Second layer        5x5 kernel, stride 2, ReLU, batch normalization
                          Third layer         3x3 kernel, stride 1, ReLU, batch normalization
Convolution model 3       First layer         3x3 kernel, stride 1, ReLU, batch normalization
                          Second layer        3x3 kernel, stride 1, ReLU, batch normalization
                          Third layer         3x3 kernel, stride 2, ReLU, batch normalization
                          Fourth layer        3x3 kernel, stride 2, ReLU, batch normalization
                          Fifth layer         3x3 kernel, stride 2, ReLU, batch normalization
Figure 8. The effect of the convolution model type on the validation accuracy [21]


4. CONCLUSION 

This paper has presented a finger vein identification system for security applications based on capsule
networks. Hyperparameter tuning was carried out on the capsule networks, including variations of the capsule
architecture and the convolution layer. The number of routing iterations and the image preprocessing were also
investigated. The capsule network finger vein identification system achieved an accuracy of 91.25% on the
SDUMLA-HMT dataset.